Columbia University TRECVID 2007 High-Level Feature Extraction
Abstract
One difficulty in the HLF task this year was the change of application domain from news video to foreign documentary video. Classifiers trained in prior years performed poorly when naively applied, while classifiers trained on the 2007 data alone may suffer from too few positive training samples. This year we address this fundamental new problem: how to efficiently and effectively adapt models learned from an old domain to a significantly different one. Investigation of this topic complements the scalability issue discussed in TRECVID 2006: how to leverage the resources of a large concept detector pool (e.g., Columbia 374) to improve the accuracy of individual detectors. We developed and tested a new Cross-Domain SVM (CDSVM) algorithm for adapting previously learned support vectors from one domain to help classification in another domain. A performance gain is obtained with almost no additional computational cost. We also conduct a comprehensive comparative study of state-of-the-art SVM-based cross-domain learning methods. To further understand the underlying contributing factors, we propose an intuitive selection criterion to determine which cross-domain learning method to use for each concept. Such a prediction mechanism is important because there is a multitude of promising methods for adapting old models to new domains, so judicious selection is the key to applying the right method in the right context (e.g., the size of the training data in the new and old domains, the variation of content between the two domains, etc.). Although no single method universally outperforms the others, with an adequate prediction mechanism we can apply the right adaptation approach under different conditions, and we demonstrate a 22% performance improvement for mid-frequency and rare concepts.

2 Introduction

There is a common issue in machine learning problems: the amount of available test data is large and growing, but the amount of labeled data is often fixed and quite small.
Video data labeled for semantic concept classification is no exception. For example, in high-level concept classification tasks (TRECVID [14]), new corpora may be added annually from unseen sources such as foreign news channels or audio-visual archives. Ideally, one desires the same low error rates when reapplying models derived from a previous source domain D^s to a new, unseen target domain D^t, a problem often referred to as domain adaptation or cross-domain learning. Recently, several approaches in this direction have been proposed in the machine learning community [4, 5, 17]. The high-level feature extraction task of TRECVID 2007 provides a large amount of cross-domain data for evaluating and comparing these methods. In TRECVID 2007, we tackle this challenging issue and make two contributions. First, a new Cross-Domain SVM (CDSVM) algorithm is developed for adapting previously learned support vectors from the source domain D^s to help detect concepts in the target domain D^t. Better precision can be obtained with almost no additional computational cost. Second, a comprehensive summary and comparative study of state-of-the-art SVM-based cross-domain learning algorithms is given. By treating the TRECVID 2007 data set as the target domain D^t and the TRECVID 2005 data set as the source domain D^s, these algorithms are evaluated over the latest large-scale TRECVID benchmark data. Finally, a simple but effective criterion is proposed to determine whether, and which, cross-domain method should be used. The rest of this paper is organized as follows. Section 3 gives an overview of several state-of-the-art SVM-based cross-domain learning methods, ordered by decreasing computational cost. Section 3.3.3 introduces our CDSVM algorithm. We also review the BCRF approach, which explores inter-concept relations. In Section 4 we discuss our submissions for the TRECVID 2007 high-level feature extraction task, and in Section 5 we compare the performance of the cross-domain learning algorithms.
Finally, in Section 6 we provide experimental conclusions and next steps for research.

3 Approach Overview

The cross-domain learning problem can be summarized as follows. Let D^t denote the target data set, which consists of two subsets: the labeled subset D_l^t and the unlabeled subset D_u^t. Let (x_i, y_i) denote a data point, where x_i is a d-dimensional feature vector and y_i is the corresponding class label. In this work we only consider the binary classification problem, i.e., y_i ∈ {+1, −1}. In addition to D^t, we have a source data set D^s whose distribution is different from, but related to, that of D^t. A binary classifier f^s(x) has already been trained over this source data set D^s. Our goal is to learn a classifier to classify the unlabeled target subset D_u^t. As D^s and D^t have different distributions, f^s(x) will not perform well in classifying D_u^t. Conversely, we can train a new classifier f^t(x) based on D_l^t alone, but when the number of training samples |D_l^t| is small, f^t(x) may not give robust performance. Since D^s is related to D^t, utilizing information from the source domain D^s to help classify the target subset D_u^t should yield better performance. This is the fundamental motivation of cross-domain learning. In this section, we briefly summarize and discuss several state-of-the-art SVM-based cross-domain learning algorithms.

3.1 Standard SVM Applied in the New Domain

Without cross-domain learning, a standard Support Vector Machine (SVM) [15] classifier can be learned from the labeled subset D_l^t to classify the unlabeled subset D_u^t. Given a data vector x, an SVM determines the corresponding label by the sign of a linear decision function f(x) = w^T x + b. For learning non-linear classification boundaries, a kernel mapping φ is introduced to project the data vector x into a high-dimensional feature space φ(x), and the corresponding class label is then given by the sign of f(x) = w^T φ(x) + b.
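As a minimal sketch of this baseline, the following NumPy snippet trains a linear SVM by subgradient descent on the standard hinge-loss primal objective, using hypothetical two-dimensional toy data in place of the labeled target subset D_l^t. In practice a library such as LIBSVM would be used; the learning rate, epoch count, and toy data here are illustrative assumptions.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Subgradient descent on (1/2)||w||^2 + C * sum_i max(0, 1 - y_i (w.x_i + b))."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in range(n):
            if y[i] * (X[i] @ w + b) < 1:
                # Margin violated: step on regularizer plus hinge subgradient.
                w -= lr * (w - C * y[i] * X[i])
                b += lr * C * y[i]
            else:
                # Margin satisfied: step on regularizer only.
                w -= lr * w
    return w, b

# Toy stand-in for D_l^t: two well-separated 2-D Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(30, 2)) + [3.0, 0.0],
               rng.normal(size=(30, 2)) - [3.0, 0.0]])
y = np.concatenate([np.ones(30), -np.ones(30)])

w, b = train_linear_svm(X, y)
# Labels are given by the sign of the decision function f(x) = w^T x + b.
acc = (np.sign(X @ w + b) == y).mean()
```

The same decision rule applies unchanged in the kernelized case; only the inner product is replaced by a kernel evaluation.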
The primary goal of an SVM is to find an optimal separating hyperplane that gives a low generalization error while separating the positive and negative training samples. This hyperplane is determined by the largest margin of separation between the two classes, i.e., by solving the following problem:

min_w (1/2)||w||^2 + C Σ_{i=1}^{N_l^t} ε_i    (1)
s.t. y_i (w^T φ(x_i) + b) ≥ 1 − ε_i, ε_i ≥ 0, ∀(x_i, y_i) ∈ D_l^t

where ε_i is the slack variable added for each data vector x_i, and C determines how much error the SVM can tolerate. One very simple way to perform cross-domain learning is to learn new models over all available samples from both domains, called the Combined SVM in this paper. The motivation for this method is that when the amount of data in the target domain is small, the target model will benefit from the large number of training samples present in D^s and should therefore be much more stable than a model trained on D^t alone. However, learning with this method incurs a large time cost, because the number of training samples increases from |D^t| to |D^t| + |D^s|.

3.2 Transductive Localized SVM (LSVM)

To decrease the generalization error in classifying the unseen data D_u^t in the target domain, transductive SVM methods [5, 9] incorporate knowledge about the new test data into the SVM optimization process so that the learned SVM can accurately classify the test data. The Localized SVM (LSVM) learns one classifier for each test sample based on its local neighborhood. Given a test data vector x̂_j, we find its neighborhood in the labeled training set D_l^t based on the similarity σ(x̂_j, x_i), x_i ∈ D_l^t:

σ(x̂_j, x_i) = exp(−β ||x̂_j − x_i||^2)

β controls the size of the neighborhood, i.e., the larger the β, the less influence each distant data point has. An optimal local hyperplane is learned from the neighborhood of each test sample by optimizing the following function:

min_w (1/2)||w||^2 + C Σ_{i=1}^{N_l^t} σ(x̂_j, x_i) ε_i    (2)
s.t.
y_i (w^T φ(x_i) + b) ≥ 1 − ε_i, ε_i ≥ 0, ∀(x_i, y_i) ∈ D_l^t

As a result, the classification of a test sample depends only on the support vectors in its local neighborhood. Transductive SVM approaches can be used directly for cross-domain learning by substituting D_l^t ∪ D^s for D_l^t in Eqn. (2). Their major drawback is the computational cost, especially for large-scale data sets.

3.3 Cross-domain Adaptation Approaches

In the cross-domain learning problem, the source data set D^s and the target data set D^t are highly related. The following cross-domain adaptation approaches investigate how to use the source data to help classify the target data.

3.3.1 Feature Replication

Feature replication combines all samples from both D^s and D^t, and tries to learn generalities between the two data sets by replicating parts of the original feature vector x_i for the different domains. This method has been shown to be effective for text document classification over multiple domains [8]. Specifically, we first zero-pad the dimensionality of x_i from d to (N+1)·d, where N is the total number of adaptation domains; in our experiments N = 2 (one source and one target). Next we transform the samples from the two domains as: x̂_i^s = (x_i, x_i, 0) and x̂_i^t = (x_i, 0, x_i), where the first d-dimensional block is shared across the domains and the remaining blocks are domain-specific.
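The replication mapping can be sketched in NumPy as follows. Per the scheme of [8], each d-dimensional vector is expanded to one block shared across domains plus one block per domain; the exact block ordering used here is an illustrative assumption.

```python
import numpy as np

def replicate_features(x, domain, n_domains=2):
    """Expand a d-dim vector to (n_domains + 1) * d dims:
    a shared copy in the first block, a domain-specific copy
    in the block for `domain` (0 = source D^s, 1 = target D^t),
    and zeros elsewhere."""
    d = x.shape[0]
    out = np.zeros((n_domains + 1) * d)
    out[:d] = x                                # shared block
    out[(domain + 1) * d:(domain + 2) * d] = x # domain-specific block
    return out

x = np.array([1.0, 2.0, 3.0])       # toy 3-dim feature vector
x_src = replicate_features(x, 0)    # source sample: (x, x, 0)
x_tgt = replicate_features(x, 1)    # target sample: (x, 0, x)
```

A single SVM trained on the replicated vectors can then exploit the shared block for cross-domain regularities while the domain-specific blocks absorb per-domain differences.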
Publication date: 2007